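Magnetic resonance imaging (MRI) is widely used to quantify vestibular schwannoma (VS) and the cochlea. Recently, deep learning methods have shown state-of-the-art performance for segmenting these structures. However, training segmentation models may require manual labels in the target domain, which are expensive and time-consuming to obtain. To overcome this problem, domain adaptation is an effective way to leverage information from a source domain and obtain accurate segmentations without manual labels in the target domain. In this paper, we propose an unsupervised learning framework to segment the VS and the cochlea. Our framework leverages information from contrast-enhanced T1-weighted (ceT1-w) MRIs and their labels, and produces segmentations for T2-weighted MRIs without any labels in the target domain. We first apply a generator to perform image-to-image translation. Next, we ensemble the outputs of a set of different models to obtain the final segmentation. To cope with MRIs from different sites and scanners, we apply a variety of "online" augmentations during training to better capture geometric variability as well as variability in image appearance and quality. Our method is easy to build and produces promising segmentations, with mean Dice scores of 0.7930 and 0.7432 for the VS and the cochlea, respectively, on the validation set.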
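As a rough illustration of the ensembling and evaluation steps mentioned in this abstract, the sketch below averages per-model foreground probabilities and scores the result with the Dice coefficient. The averaging rule, the 0.5 threshold, and the function names `ensemble_average` and `dice_score` are assumptions made for illustration; the abstract does not say how the model outputs are combined.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treat as perfect agreement (a common convention).
        return 1.0
    return 2.0 * np.logical_and(pred, target).sum() / denom


def ensemble_average(prob_maps, threshold=0.5):
    """Average foreground probabilities across models, then threshold.

    Assumption: simple mean fusion; other fusion rules are possible.
    """
    mean_prob = np.mean(np.stack(prob_maps, axis=0), axis=0)
    return mean_prob >= threshold


# Toy 2x2 "images": three hypothetical model outputs and one reference mask.
probs = [
    np.array([[0.9, 0.2], [0.6, 0.1]]),
    np.array([[0.8, 0.6], [0.3, 0.2]]),
    np.array([[0.7, 0.1], [0.7, 0.4]]),
]
reference = np.array([[1, 0], [0, 0]])
fused = ensemble_average(probs)
score = dice_score(fused, reference)
```

In this toy case the fused mask has two foreground pixels, one of which overlaps the reference, so the Dice score is 2/3.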
translated by 谷歌翻译
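Price movement forecasting aims to predict the future trends of financial assets based on current market conditions and other relevant information. Recently, machine learning (ML) methods have become increasingly popular and have achieved promising forecasting results in both academia and industry. Most existing ML solutions formulate the forecasting problem as a classification problem (to predict the direction) or a regression problem (to predict the return) over the entire training set. However, because financial data have an extremely low signal-to-noise ratio and a stochastic nature, good trading opportunities are extremely scarce. As a result, without careful selection of potentially profitable samples, such ML methods are prone to capturing patterns of noise rather than real signals. To address this issue, we propose a novel price movement forecasting framework called Locality-Aware Attention and Iterative Refinement Labeling (LARA), which consists of two main components. 1) Locality-aware attention automatically extracts potentially profitable samples by attending to their surrounding class-aware label information. Moreover, equipped with metric learning techniques, locality-aware attention enjoys task-specific distance metrics and distributes attention over potentially profitable samples more effectively. 2) Iterative refinement labeling further iteratively refines the labels of noisy samples and then combines the learned predictors to be robust to unseen and noisy samples. In a number of experiments on three real-world financial markets (ETFs, stocks, and cryptocurrencies), LARA achieves superior performance compared with traditional time-series analysis methods and a set of machine-learning-based competitors on the Qlib platform. Extensive ablation studies and experiments also show that LARA indeed captures more reliable trading opportunities.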
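Recently, deep learning methods have achieved state-of-the-art performance in many medical image segmentation tasks. Many of them are based on convolutional neural networks (CNNs). In such methods, the encoder is the key component for extracting global and local information from the input image. The extracted features are then passed to the decoder to predict the segmentation. In contrast, several recent works show superior performance with transformers, which can better model long-range spatial dependencies and capture low-level details. However, transformers used as the sole encoder underperform on some tasks in which they cannot efficiently replace convolution-based encoders. In this paper, we propose a model with dual encoders for 3D biomedical image segmentation. Our model is a U-shaped CNN augmented with an independent transformer encoder. We fuse the information from the convolutional encoder and the transformer, and pass it to the decoder to obtain the results. We evaluate our method on three public datasets from three different challenges: BTCV, MoDA, and Decathlon. Compared with state-of-the-art models with and without transformers on each task, our proposed method obtains higher Dice scores across the board.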
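This work considers optimization of a composition of functions in nested form, where each function contains an expectation. Problems of this type are gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, because stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian stochastic composition gradient descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and to the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structure, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is evaluated numerically on a policy evaluation problem in reinforcement learning.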
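For orientation, the two-level problem and the source of the gradient bias described in this abstract can be written as below. The notation ($\mathcal{M}$, $F$, $f_\xi$, $g_\zeta$) is an assumption made for illustration and is not taken from the paper.

```latex
% Assumed notation: \mathcal{M} is a Riemannian manifold, g is the inner
% function, f is the outer function, and \xi, \zeta are random variables.
\begin{align}
  \min_{x \in \mathcal{M}} \; F(x) &= f\bigl(g(x)\bigr), \qquad
  f(y) = \mathbb{E}_{\xi}\bigl[f_{\xi}(y)\bigr], \quad
  g(x) = \mathbb{E}_{\zeta}\bigl[g_{\zeta}(x)\bigr], \\
  % Plugging a single sample of g into \nabla f gives a biased estimate:
  \mathbb{E}_{\zeta}\bigl[\nabla f\bigl(g_{\zeta}(x)\bigr)\bigr]
  &\neq \nabla f\bigl(\mathbb{E}_{\zeta}[g_{\zeta}(x)]\bigr)
  \quad \text{in general (equality holds when } \nabla f \text{ is affine)}, \\
  % Approximate stationarity in the sense used in the abstract:
  \mathbb{E}\bigl[\lVert \operatorname{grad} F(x) \rVert^{2}\bigr] &< \epsilon .
\end{align}
```

The second line is why a plain stochastic gradient step is not unbiased here: the expectation over the inner samples sits inside the nonlinear map $\nabla f$.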
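Multiple sclerosis (MS) is a chronic neuroinflammatory disease, and multi-modality MRIs are routinely used to monitor MS lesions. Many automatic MS lesion segmentation models have been developed and have reached human-level performance. However, most established methods assume that the MRI modalities used during training are also available during testing, which is not guaranteed in clinical practice. Previously, a training strategy termed modality dropout (ModDrop) has been applied to MS lesion segmentation to achieve state-of-the-art performance with missing modalities. In this paper, we present a novel method called ModDrop++ to train a unified network that adapts to an arbitrary number of input MRI sequences. ModDrop++ upgrades the main idea of ModDrop in two key ways. First, we devise a plug-and-play dynamic head and adopt a filter scaling strategy to improve the expressiveness of the network. Second, we design a co-training strategy to leverage the intra-subject relation between full-modality and missing-modality data. Specifically, the intra-subject co-training strategy guides the dynamic head to generate similar feature representations for full-modality and missing-modality data from the same subject. We use two public MS datasets to show the superiority of ModDrop++. Source code and trained models are available at https://github.com/han-liu/moddropplusplus.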
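The basic modality-dropout idea that this abstract builds on can be sketched as follows: whole input channels are zeroed at random during training so the network learns to cope with missing sequences. The drop probability, the rule of always keeping at least one modality, and the function name `modality_dropout` are assumptions made for illustration; this is not the ModDrop++ dynamic head or co-training strategy.

```python
import numpy as np


def modality_dropout(x, rng, p_drop=0.5):
    """Randomly zero out whole modality channels, keeping at least one.

    x: array of shape (num_modalities, ...), one channel per MRI sequence.
    Returns the masked copy and the boolean keep-mask that was applied.
    """
    num_modalities = x.shape[0]
    keep = rng.random(num_modalities) >= p_drop
    if not keep.any():
        # Assumption: never drop every modality at once.
        keep[rng.integers(num_modalities)] = True
    # Broadcast the per-modality mask over the spatial dimensions.
    mask_shape = (num_modalities,) + (1,) * (x.ndim - 1)
    return x * keep.reshape(mask_shape), keep


rng = np.random.default_rng(0)
volume = np.ones((4, 2, 2, 2))  # 4 hypothetical modalities, tiny 3D patch
masked, keep = modality_dropout(volume, rng)
```

The returned `keep` mask could also be fed to the network as an availability code, which is one plausible way to condition a head on the set of present modalities.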
Benefiting from its ability to exploit intrinsic supervision information, contrastive learning has recently achieved promising performance in deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms prevent existing algorithms from improving further. 1) The quality of positive samples depends heavily on carefully designed data augmentations, and inappropriate augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster and pushes away those from other clusters by maximizing the cross-view cosine similarity between positive samples and minimizing it between negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
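The pull-and-push objective described in this abstract can be illustrated with a small NumPy sketch: cross-view pairs from the same cluster are positives, and centers of different clusters act as negatives. The exact loss form, the equal weighting of the two terms, and the name `cluster_guided_loss` are assumptions made for illustration, not the objective used in CCGC.

```python
import numpy as np


def l2_normalize(z, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)


def cluster_guided_loss(z1, z2, labels):
    """Toy cluster-guided contrastive loss on two views.

    z1, z2: (n, d) embeddings of the same n high-confidence nodes.
    labels: (n,) cluster assignments of those nodes.
    """
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    sim = z1 @ z2.T  # cross-view cosine similarities
    same_cluster = labels[:, None] == labels[None, :]
    # Positive term: push same-cluster cross-view similarity towards 1.
    pos_loss = 1.0 - sim[same_cluster].mean()
    # Negative term: similarity between centers of different clusters.
    clusters = np.unique(labels)
    c1 = l2_normalize(np.stack([z1[labels == k].mean(axis=0) for k in clusters]))
    c2 = l2_normalize(np.stack([z2[labels == k].mean(axis=0) for k in clusters]))
    different = ~np.eye(len(clusters), dtype=bool)
    neg_loss = (c1 @ c2.T)[different].mean() if different.any() else 0.0
    return pos_loss + neg_loss


z = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
good = cluster_guided_loss(z, z, np.array([0, 0, 1, 1]))  # clusters match geometry
bad = cluster_guided_loss(z, z, np.array([0, 1, 0, 1]))   # clusters cut across it
```

With labels that match the geometry the loss is close to zero, while labels that mix the two directions give a larger value, which is the behavior the objective is meant to reward.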
To generate high-quality rendered images for real-time applications, it is common to trace only a few samples per pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that pixels rendered at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which turns supersampling into an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features, which significantly improves their temporal stability. Then, a reconstruction network based on a multi-scale U-Net with skip connections is adopted to reconstruct and generate the desired high-resolution image. Experimental results and comparisons show that our proposed method generates higher-quality supersampling results than current state-of-the-art methods, without increasing the total number of ray-traced samples.
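To make the 1/4-spp setting concrete, the sketch below builds a sample mask at the target resolution with one traced pixel per 2x2 block and applies it to an image, leaving the remaining pixels to be interpolated. The regular 2x2 pattern, the `offset` parameter, and the function names are assumptions made for illustration; the abstract does not specify the sampling pattern.

```python
import numpy as np


def quarter_spp_mask(height, width, offset=(0, 0)):
    """Boolean mask with one traced pixel per 2x2 block (1/4 spp on average).

    Assumption: a regular grid; `offset` picks which pixel of each block is traced.
    """
    mask = np.zeros((height, width), dtype=bool)
    mask[offset[0]::2, offset[1]::2] = True
    return mask


def sparse_samples(image, mask):
    """Keep traced pixels and zero the rest; a network would fill in the gaps."""
    return np.where(mask[..., None], image, 0.0)


mask = quarter_spp_mask(4, 6)
image = np.ones((4, 6, 3))  # stand-in for a ray-traced RGB frame
sparse = sparse_samples(image, mask)
```

Passing the mask to the network alongside the sparse image is one plausible reading of "mask-reinforced", since it tells the model which pixels are reliable ray-traced values.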
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods overlook two important issues: 1) Boundary bias: the annotated target segment generally refers to two specific frames as its start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning bias: such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate these limitations, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames that enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on the boundaries for more accurate frame-query reasoning. This mechanism can also supplement the sparsely sampled frames with the missing consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
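The boundary bias described in this abstract is easy to reproduce with a toy example: after uniform sparse sampling, the annotated start and end frames are usually not among the sampled frames, so the boundaries snap to neighbors. The sampling rule (bin centers), the frame counts, and the helper names are assumptions made for illustration.

```python
def uniform_sample_indices(num_frames, num_samples):
    """Pick the center frame of each of `num_samples` equal-length bins."""
    step = num_frames / num_samples
    return [int(step * i + step / 2) for i in range(num_samples)]


def nearest_sampled(sampled, frame):
    """Sampled frame index closest to an annotated boundary frame."""
    return min(sampled, key=lambda s: abs(s - frame))


# Hypothetical video: 100 frames, 8 sampled, target segment spans frames 25..50.
sampled = uniform_sample_indices(100, 8)
start, end = 25, 50
new_start = nearest_sampled(sampled, start)
new_end = nearest_sampled(sampled, end)
shift = (new_start - start, new_end - end)
```

Here the sampled frames are 6, 18, 31, 43, 56, 68, 81, and 93, so both boundaries shift by six frames (25 to 31 and 50 to 56), which is the kind of error the additional contextual frames are meant to correct.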
Representing and synthesizing novel views in real-world dynamic scenes from casual monocular videos is a long-standing problem. Existing solutions typically approach dynamic scenes by applying geometry techniques or by using temporal information between several adjacent frames, without considering the underlying background distribution in the entire scene or the transmittance along the ray dimension, which limits their performance in static and occluded areas. Our approach, $\textbf{D}$istribution-$\textbf{D}$riven neural radiance fields, offers high-quality view synthesis and a 3D solution to $\textbf{D}$etach the background from the entire $\textbf{D}$ynamic scene, and is called $\text{D}^4$NeRF. Specifically, it employs a neural representation to capture the scene distribution in the static background and a 6D-input NeRF to represent dynamic objects. Each ray sample is given an additional occlusion weight to indicate how the transmittance is divided between the static and dynamic components. We evaluate $\text{D}^4$NeRF on public dynamic scenes and on our urban driving scenes acquired from an autonomous-driving dataset. Extensive experiments demonstrate that our approach outperforms previous methods in rendering texture details and motion areas while also producing a clean static background. Our code will be released at https://github.com/Luciferbobo/D4NeRF.
Deploying reliable deep learning techniques in interdisciplinary applications requires learned models to output accurate and (even more importantly) explainable predictions. Existing approaches typically explain network outputs in a post-hoc fashion, under the implicit assumption that faithful explanations come from accurate predictions or classifications. We make the opposite claim: explanations boost (or even determine) classification. That is, end-to-end learning of explanation factors to augment discriminative representation extraction could be a more intuitive strategy to inversely assure fine-grained explainability, e.g., in neuroimaging and neuroscience studies with high-dimensional data containing noisy, redundant, and task-irrelevant information. In this paper, we propose such an explainable geometric deep network, dubbed NeuroExplainer, with applications to uncovering altered infant cortical development patterns associated with preterm birth. Given fundamental cortical attributes as network input, NeuroExplainer adopts a hierarchical attention-decoding framework to learn fine-grained attentions and the corresponding discriminative representations in order to accurately distinguish preterm infants from term-born infants at term-equivalent age. NeuroExplainer learns the hierarchical attention-decoding modules under subject-level weak supervision coupled with targeted regularizers deduced from domain knowledge of brain development. These prior-guided constraints implicitly maximize the explainability metrics (i.e., fidelity, sparsity, and stability) during network training, driving the learned network to output detailed explanations and accurate classifications. Experimental results on the public dHCP benchmark suggest that NeuroExplainer yields quantitatively reliable explanations that are qualitatively consistent with representative neuroimaging studies.